Using the United Nation’s Human Development Index (HDI) data to produce maps and other charts.
The Economist published this map in September 2022 in order to highlight that ‘living standards are moving in the wrong direction’. It is striking how most countries in the world have worsened their scores in the Human Development Index (HDI) between 2019 and 2021, which makes the map mainly red. The HDI, calculated by the United Nations since 1990, is a summary measure of average achievement in three key dimensions of human development: a long and healthy life, being knowledgeable and have a decent standard of living.
The aim of this tutorial is to reproduce the map and propose an alternative visualization using the same dataset.
All the data needed for this map can be downloaded directly from the UNDP website. https://hdr.undp.org/data-center/documentation-and-downloads They provide a complete data set with the latest ranking (2021), the index value for each country since 1990 and its components. Since the map only shows the change between 2019 and 2021, I select a subset of the data to compute the growth change in the required period, the country name and the ISO3 code.
To draw the world map, the giscoR packaged is used to get the geographic information of each country in an sf format, which is required to use geom_sf in ggplot.
world <- giscoR::gisco_get_countries()
class(world)
[1] "sf" "data.frame"
Now, the geographical information and the HDI data can be joined by the ISO3 code to form the complete dataset. This dataset must also be class sf, so I use st_as_sf to keep this format. Then, the Antarctica is removed from the final data, since this region does not appear in the original map.
#checking that both data sets contain the same countries
hdi_data <- HDI %>% distinct(iso3) %>% as_vector()
nat_data <- world$ISO3_CODE
differ <-as.data.frame(setdiff(hdi_data, nat_data))
all_data <- st_as_sf(left_join(world, my_data, by = c("ISO3_CODE" = "iso3")))
all_data <- all_data %>% filter(!ISO3_CODE %in% c("ATA"))
summary(all_data)
CNTR_ID NAME_ENGL ISO3_CODE
Length:256 Length:256 Length:256
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
CNTR_NAME FID country
Length:256 Length:256 Length:256
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
hdi_2019 hdi_2021 change
Min. :0.3930 Min. :0.3850 Min. :-5.2000
1st Qu.:0.6105 1st Qu.:0.5995 1st Qu.:-1.5000
Median :0.7460 Median :0.7390 Median :-0.9000
Mean :0.7275 Mean :0.7206 Mean :-0.9838
3rd Qu.:0.8435 3rd Qu.:0.8350 3rd Qu.:-0.3000
Max. :0.9620 Max. :0.9620 Max. : 2.6000
NA's :65 NA's :65 NA's :65
geometry
MULTIPOLYGON :134
POLYGON :122
epsg:4326 : 0
+proj=long...: 0
To start with, the map is plotted by using geom_sf, which automatically interprets the geographical data. The fill aesthetic is set to the variable “change”, and the proper scale fill is added to fix the same breaks and colors than The Economist. The guide for it is guide_colorsteps. Color is fixed in white and the size in 0.1 to generate the lines for countries’ contour. Theme_map is added, since it sets all chart elements in blank but for the map.
m <- ggplot(data = all_data) +
geom_sf(aes(fill = change), color = "white", size = 0.1) +
scale_fill_stepsn(breaks = c(-4,-2,0,2),
colors = c("#be0f08", "#f6423c", "#ffa9a6", "#a4bfd6", "#1e5c98"),
na.value = "#e0ded4",
guide = guide_colorsteps(even.steps = TRUE,
order = 1,
ticks = TRUE,
ticks.colour= "black",
ticks.linewidth = 1,
direction = "horizontal",
title = NULL,
draw.llim = TRUE,
draw.ulim = TRUE,
frame.color = "black",
label.theme = element_text(size= 15))) +
theme_map()
m
Since the previous legend does not show the missing values, another legend has to be configured for them. To do so, I get a very small region (Gibraltar), and add it as another map layer with color aesthetic in blank. For this aes, a scale color is defined in which a guide_bins is added specifying the proper color in the override.aes argument.
Gibraltar <- all_data %>% filter(ISO3_CODE == "GIB")
m <- m + geom_sf(data = Gibraltar, aes(color = ""))+
scale_color_discrete(guide=guide_bins(
axis = FALSE,
label = FALSE,
title = "No data",
title.position = "right",
title.vjust = 1,
title.hjust = 0,
title.theme = element_text(size = 16,
color = "grey50"),
override.aes = list(color = "#e0ded4", fill = "#e0ded4"),
order = 2,
direction = "horizontal")) +
theme(legend.position = c(0.63,0.95),
legend.key = element_rect(color = "white"),
legend.key.height = unit(2.2,"mm"),
legend.key.width = unit(4.7,"mm"),
legend.margin = margin(0,0,0,0),
legend.direction = "horizontal",
legend.box = "horizontal",
legend.spacing = unit(4.7,"mm"),
legend.spacing.y = unit(1,"mm"),
legend.background = element_blank())
m
To add the black points place in the center of each of the countries annotated, we can filter them from the
dataset and get their geographical centers with st_point_on_surface
Another dataframe is created for Australia
Then, we can add them in two different layers
m <- m + geom_sf_text(data = australia, aes(label = paste(NAME_ENGL, "1.1")),
size = 4.7,
color = "#1e5c98")+
geom_sf(data = countries_center, size = 0.5)
m
To this point, all the geographic information available in our dataset has been used, but the map does not look the same than the original because by default the projection used by geom_sf is …… Since the one used by The Economist is the Robinson projection, the coordinate system of the map has to be changed in accordance.
After changing the projection, all the following annotations must be in the same Coordinate Reference System, so a new sf object is created with the standard coordinates (latitude and longitude) of each country to be annotated and other for the end of the segment linked to it. Then, they are transformed to the Robinson projection.
lebanon <- st_sfc(st_point(c(35.49,33.9)), crs=4326)
lebanon <- st_transform(lebanon, "+proj=robin")
lebanon_end <- st_sfc(st_point(c(-14,33.9)), crs=4326)
lebanon_end <- st_transform(lebanon_end, "+proj=robin")
namibia <- st_sfc(st_point(c(17.7,-23)), crs=4326)
namibia <- st_transform(namibia, "+proj=robin")
namibia_end <- st_sfc(st_point(c(8,-23)), crs=4326)
namibia_end <- st_transform(namibia_end, "+proj=robin")
venezuela <- st_sfc(st_point(c(-65,6.5)), crs=4326)
venezuela <- st_transform(venezuela, "+proj=robin")
venezuela_end <- st_sfc(st_point(c(-85,6.5)), crs=4326)
venezuela_end <- st_transform(venezuela_end, "+proj=robin")
bangladesh <- st_sfc(st_point(c(90,23.7)), crs=4326)
bangladesh <- st_transform(bangladesh, "+proj=robin")
bangladesh_end <- st_sfc(st_point(c(130,23.7)), crs=4326)
bangladesh_end <- st_transform(bangladesh_end, "+proj=robin")
With the elements above, segments and text annotations can be put in place using annotate. The name and the change value of each country are used as labels, and their corresponding colors are also added.
m <- m + annotate("segment", x=lebanon[[1]][1], y=lebanon[[1]][2],
xend=lebanon_end[[1]][1], yend=lebanon_end[[1]][2], size = 0.5) +
annotate("text", x=lebanon_end[[1]][1], y=lebanon_end[[1]][2],
label = "Lebanon -5.2",
hjust = 1.05,
color = "#be0f08",
size = 4.7) +
annotate("segment", x=namibia[[1]][1], y=namibia[[1]][2],
xend=namibia_end[[1]][1], yend=namibia_end[[1]][2], size = 0.5) +
annotate("text", x=namibia_end[[1]][1], y=namibia_end[[1]][2],
label = "Namibia -3.8",
hjust = 1.05,
color = "#f6423c",
size = 4.7)+
annotate("segment", x=venezuela[[1]][1], y=venezuela[[1]][2],
xend=venezuela_end[[1]][1], yend=venezuela_end[[1]][2], size = 0.5) +
annotate("text", x=venezuela_end[[1]][1], y=venezuela_end[[1]][2],
label = "Venezuela -4.2",
hjust = 1.05,
color = "#be0f08",
size = 4.7)+
annotate("segment", x=bangladesh[[1]][1], y=bangladesh[[1]][2],
xend=bangladesh_end[[1]][1], yend=bangladesh_end[[1]][2], size = 0.5)+
annotate("text", x=bangladesh_end[[1]][1], y=bangladesh_end[[1]][2],
label = "Bangladesh 2.6",
hjust = -0.05,
color = "#1e5c98",
size = 4.7)
m
Lastly, adding the proper title, subtitle and caption, and placing them in the same positions, the final map is achieved. The font used is “Open Sans”, which can be downloaded from Google Fonts.
m + labs(title = "What a time to be alive",
subtitle = "Change in UN Human Development Index, 2019-21, %",
caption = "Source: United Nations")+
theme(plot.title = element_text(face="bold",
family = "Open Sans",
size = 16,
hjust = 0.14,
margin = margin(10,0,5,0)),
plot.title.position = "panel",
plot.subtitle = element_text(size = 16,
family = "Open Sans",
hjust = 0.195,
margin = margin(0,0,1,0)),
plot.caption = element_text(size = 13.5,
family = "Open Sans",
hjust = 0.14,
color = "grey50",
margin = margin(0,0,0,0)),
plot.caption.position = "panel")
The Economist’s map shows just a narrow time span of the evolution of living standards, and it does not allow to see which areas are better off worldwide either. Since the data provided by UN does contain all this information, I want to produce a chart that convey regional disparities and temporal evolution.
Additionally, as the HDI index is also produce separately for men and women, it is possible to compute the difference between them for each country and year (if available). In this way, a measure of the gender gap is generated as HDI male - HDI female, whose evolution in time can also be tracked. In this way, a gender perspective is added to the alternative visualization.
The UN’s dataset is in a wide format, with years added to the name of each variable, having more than 1000 columns. In order to produce a tidy dataset with only the required information, I compute it by joining 3 different sub dataframes (main index, male and female) by country and year.
HDI<- HDI %>% rename(ranking_2021 = hdi_rank_2021)
p <- HDI %>%
select(matches("^(hdi)(_)([0-9]+)$"), iso3, country, ranking_2021, region, hdicode) %>%
pivot_longer(contains("hdi_"), names_sep = "_",
names_to = c("hdi", "year"), values_to = "value")
f <- HDI %>%
select(contains("hdi_f"), country) %>%
pivot_longer(contains("hdi_"), names_to = c("year"), values_to = "hdi_f") %>%
mutate(year = str_remove(year, "hdi_f_"))
m <- HDI %>%
select(contains("hdi_m"), country) %>%
pivot_longer(contains("hdi_"), names_to = c("year"), values_to = "hdi_m") %>%
mutate(year = str_remove(year, "hdi_m_"))
hditotal <- left_join(p,m, by = c("year"="year", "country" = "country"))
hditotal <- left_join(hditotal,f, by = c("year"="year", "country" = "country"))
Once the data is ready, I generate the gender gap variable and convert year to numeric. In addition, I remove the observations related to aggregated areas () and create a Region variable based on the original one used to classify developing regions, where missing values correspond to developed countries.
hditotal <- hditotal %>% mutate(f_m_diff = hdi_m - hdi_f,
year = as.numeric(year)) %>%
filter(!startsWith(iso3, "ZZ")) %>%
mutate(Region = case_when(
region == "AS" ~ "Arab States",
region == "EAP" ~ "East Asia & Pacific",
region == "ECA" ~ "Europe & Central Asia",
region == "LAC" ~ "Latin America & Caribbean",
region == "SA" ~ "South Asia",
region == "SSA" ~ "Sub-Saharan Africa",
TRUE ~ "Developed"))
a <- hditotal %>% filter(year == 1990) %>%
ggplot() +
scale_y_continuous(limits= c(0.2,1), breaks = hdi_levels)+
scale_x_continuous(limits = c(-0.08,0.28), expand = c(0,0))+
geom_text(aes(0.24, 0.9, label=year), size= 11, color = "gray")+
geom_hline(yintercept=c(0.5,0.7,0.8, 1), linetype="dashed", color = "grey")+
geom_vline(xintercept=c(0), linetype="dashed", color = "grey50")+
annotate("text", y=hdi_pos_levels, x=-0.08,
color="darkgray", vjust=1.3, hjust=0, size=5,
label=hdi_text_levels) +
xlab("HDI Gender gap")+
ylab("Human Development Index (HDI)")+
theme(text = element_text(family = "Open Sans"),
axis.line.y = element_blank(),
axis.line.x = element_line(color = "gray50"),
axis.ticks = element_blank(),
axis.text = element_text(size = 11),
axis.title = element_text(size = 14),
axis.text.y.left = element_text(margin = margin(10,10,10,10)),
panel.grid.major.x =element_line(linetype=1, color = "grey", size = 0.05),
panel.grid.minor = element_blank(),
panel.background = element_blank())+
geom_point(aes(y = value, x = f_m_diff, color = Region, size = hdi_f), alpha = 0.7) +
scale_color_brewer(palette = "Set1",
guide = guide_legend(title = "Region",
override.aes = list(size=5),
direction = "vertical",
title.vjust = 1,
title.hjust = 0,
title.position = "top",
order = 2,
label.theme = element_text(size=12)))+
scale_size_area(breaks = c(0.2,0.5,0.7,0.8, 1),
max_size = 5,
guide = guide_legend(title = "HDI female",
title.vjust = 1,
title.hjust = 0.5,
title.position = "top",
direction = "vertical",
label.position = "right",
order=1))+
#geom_point(aes(size=hdi_f, x = f_m_diff, y = value), color="#333333", shape=21)+
labs(title = "Human Development Index and Gender gaps",
subtitle = "Gender gap = HDI male - HDI female") +
theme(plot.title = element_text(face = "bold", size = 15),
plot.subtitle = element_text(face = "italic", size = 14),
plot.title.position = "plot",
legend.key = element_blank(),
legend.direction = "v")
a
countries <- c("Spain", "Afghanistan", "China", "Yemen", "India", "Niger", "Uruguay")
labels1990 <- hditotal %>% filter(country %in% countries, year == 1990)
a + geom_text(data = labels1990,aes(y = value, x = f_m_diff,label = country),
check_overlap = TRUE,
size = 4.5)
c <- ggplot(hditotal) +
scale_y_continuous(limits= c(0.2,1), breaks = hdi_levels)+
scale_x_continuous(limits = c(-0.08,0.28), expand = c(0,0))+
geom_text(aes(0.24, 0.9, label= as.factor(year)), size= 6, color = "gray")+
geom_hline(yintercept=c(0.5,0.7,0.8, 1), linetype="dashed", color = "grey")+
geom_vline(xintercept=c(0), linetype="dashed", color = "grey50")+
annotate("text", y=hdi_pos_levels, x=-0.08,
color="darkgray", vjust=1.3, hjust=0, size=3.5,
label=hdi_text_levels) +
xlab("HDI Gender gap")+
ylab("Human Development Index (HDI)")+
theme(text = element_text(family = "Open Sans"),
axis.line.y = element_blank(),
axis.line.x = element_line(color = "gray50"),
axis.ticks = element_blank(),
axis.text = element_text(size = 9),
axis.title = element_text(size = 10),
axis.text.y.left = element_text(margin = margin(10,10,10,10)),
panel.grid.major.x =element_line(linetype=1, color = "grey", size = 0.05),
panel.grid.minor = element_blank(),
panel.background = element_blank())+
geom_point(aes(y = value, x = f_m_diff, color = Region, size = hdi_f), alpha = 0.6) +
scale_color_brewer(palette = "Set1",
guide = guide_legend(title = "Region",
override.aes = list(size=6),
direction = "vertical",
title.vjust = 1,
title.hjust = 0,
title.position = "top",
order = 2,
label.theme = element_text(size=10)))+
scale_size_area(breaks = c(0.2,0.5,0.7,0.8, 1),
max_size = 6,
guide = guide_legend(title = "HDI female",
title.vjust = 1,
title.hjust = 0.5,
title.position = "top",
direction = "vertical",
label.position = "right",
order=1))+
labs(title = "Human Development Index and Gender gaps",
subtitle = "Gender gap = HDI male - HDI female") +
theme(plot.title = element_text(face = "bold", size = 13),
plot.subtitle = element_text(face = "italic", size = 11),
plot.title.position = "plot",
legend.key = element_blank(),
legend.direction = "v") +
transition_time(year) +
ease_aes('linear')
animate(c, height = 400, width = 600)